ABSTRACT



A review of literature (1984 to 1991) was undertaken to: (1) determine 
the status of research pertaining to the utilization of student evaluations of 
teaching; (2) evaluate the specific studies that pertain to use of evaluations for 
the improvement of science teaching; and (3) examine the instruments 
available for use by science educators. Eighty percent of the studies were 
found to be in higher education, with a large number of these studies 
investigating and generally confirming the reliability and validity of student 
evaluations. Only four percent of the studies concerned science classrooms, 
and there was a lack of research addressing the utility of feedback from 
evaluations for teaching improvement. Few instruments were found that 
are specifically designed for use in science classrooms. Tables summarize the 
results.

Objectives

A review of the literature was undertaken to: (1) determine the status 
of research on student evaluation of teaching in general and in science classes 
specifically; (2) determine the usefulness of this type of research as an 
indication of teacher characteristics, teaching behaviors, and teaching 
methods considered by students to be important in effective science teaching; 
(3) determine the potential use of feedback from student evaluations by the 
science teacher for improvement of teaching; (4) determine the extent of the 
use of student evaluations to provide information for science education 
research; and (5) determine the types of instruments available to science 
teachers and researchers at this time to acquire input from students. This 
paper presents a review of studies pertaining to student evaluation of 
teachers published between 1984 and 1991. The beginning date for this search 
was chosen because it marked the year of publication of a major review article 
by H. W. Marsh in which he described student ratings as useful in research as 
well as for diagnostic feedback because they provide both a process-description
measure and a product measure. He noted additionally that their use in
research on teaching has been under-utilized. A primary objective of this
review is to ascertain the current level of use of student evaluations by 
researchers interested in improving the teaching of science. 

Significance 

College students have been evaluating faculty since the introduction of
the first formal published evaluation form, the Purdue Rating Scale of
Instruction, in 1926 (Darr, 1977). At the post-secondary level, student ratings
are the most common source of data used to evaluate teaching effectiveness,
distantly followed by peer ratings, administrative ratings, and instructor
self-evaluations. Because of their widespread use, thousands of papers have been
written on students' evaluations. These papers address the design, development,
and research pertaining to the evaluation instrument; the validity,
reliability, generalizability, and potential biases of student ratings; and the
utility of student evaluations in the improvement of teaching. Marsh (1984) 
reported that the studies provide insight but cannot be easily summarized, 
and that opinions of the role of students' evaluations range from "reliable, 
valid, and useful" to "unreliable, invalid, and useless." Marsh and other 
reviewers (Cohen, 1981; Darr, 1977; Levinson-Rose & Menges, 1981) 
concluded, however, that given an appropriately designed instrument, 
student evaluations are reliable and stable, primarily a function of the 
instructor, valid against a variety of indicators of effective teaching, relatively 
unaffected by potential biases, and useful for improving teaching 
effectiveness. These reviews and others do not, however, examine the 
contribution that student evaluations of teaching can make specifically to the
understanding and improvement of the teaching of science.

The question of the utility of student evaluations for the improvement 
of science teaching is predicated on the assumption that there is something
wrong (ineffective) with science teaching as it occurs now. This assumption
is supported by data from the Third Assessment of Science of the National 
Assessment of Educational Progress (Yager & Penick, 1984), as well as from 
other sources of achievement and attitude data, and has led to an overall 
move toward reform in science education. The current focus of science 
education research on the student, as evidenced by research on the
constructivist learning model, gender-bias, misconceptions, attitudes, and use 
of interview methodology, would seem to indicate that educators are giving 
more credence to student opinions about all aspects of their education. 
Students have a lot to say to science educators about effective teaching. They
are, after all, professional teacher-watchers, and they know when they are
learning and can describe characteristics of an effective teacher. The "ideal 
science teacher" as described by students is very nearly the same as that 
described by science education researchers and teachers (Al Methen & 
Wilkinson, 1986; Brekelmans, Wubbels, & Creton, 1990; Tairab & Wilkinson, 
1991). Both groups seem to know "what should be." Students are in a perfect 
position to describe "what is." Science educators need to determine how to 
best elicit information from students and then how to utilize that 
information to improve science education. The use of student evaluations of 
teachers offers one method that can be used at more than just one educational 
level. 

Design and Procedures 

A computer search of material included in the Educational Resources
Information Center (ERIC) data base was conducted, starting with
the major descriptor, "student-evaluation-of-teacher-performance." The 
references investigated included all of those with this major descriptor 
appearing in ERIC between 1984 and 1991. During those eight years, 589 
references appeared with that descriptor in the ERIC data base. Of these 
references, only those representing original research where student 
evaluation of teaching was of primary importance to the study were 
investigated. Review articles, editorials and opinion papers, and those 
describing an overall evaluation program or evaluations of departments, 
programs, or specific courses were not included for review. Those conducted 
in professional (medical, dental, etc.) or graduate schools were not included 
because of their limited application. This left a total of 167 to be examined. 
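
The screening arithmetic described above can be cross-checked against the per-year counts reported in the data table at the end of this review. The following is a minimal Python sketch; the variable and function names are illustrative assumptions and are not part of the original study.

```python
# Per-year counts transcribed from the "Data Set for This Review" table (1984-1991).
YEARS = list(range(1984, 1992))
ERIC_HITS = [85, 76, 77, 94, 78, 61, 71, 47]     # references carrying the major descriptor
MET_CRITERIA = [21, 19, 25, 23, 19, 22, 18, 20]  # references meeting the selection criteria


def retention_rate(selected, identified):
    """Return the share of identified references that survived screening."""
    total_identified = sum(identified)
    if total_identified == 0:
        raise ValueError("no references identified")
    return sum(selected) / total_identified


if __name__ == "__main__":
    print(sum(ERIC_HITS))      # total references identified by the descriptor
    print(sum(MET_CRITERIA))   # total references retained for examination
    print(f"{retention_rate(MET_CRITERIA, ERIC_HITS):.1%}")
```

Under these counts the yearly figures sum to the 589 identified and 167 retained references reported in the text, so a little over a quarter of the identified references were examined.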



Each reference was then categorized according to: (1) educational level
(elementary, secondary, higher); (2) specific subject area if included or of
importance; (3) the focus, goal, or purpose of the study; and (4) type or name of
evaluative instrument used. The categories for focus of study included: (1)
instrumentation studies (developing and testing of an instrument, reliability
and validity studies, bias and halo effects); (2) descriptive studies in which
student evaluations were used to indicate students' perceptions of teaching
behaviors or teaching effectiveness; (3) studies in which student evaluations
were used to assess teaching change after an intervention or after feedback
from a previous rating; (4) studies of teacher attitudes toward use of student
ratings; (5) studies of student attitudes toward these evaluations; and (6)
studies relating student achievement, student learning, or cognitive styles to
student evaluations. Of the total number of studies, only fourteen were
selected by ERIC search when the second descriptors ("science instruction,"
"science education," or "science teachers") were added, and only eight of these
met the above criteria. These studies were investigated further to ascertain
what types of information they contributed to science education. In addition,
selected science education research journals were manually searched in order
to evaluate the reliability of the major descriptor in identifying articles on
the topic of student evaluations of their teachers.

Learning environment studies were generally not identified using
these descriptors and it was not the intention of this assessment to evaluate
those studies that appeared in the recent NARST monograph on learning
environments (Fraser, 1989). These studies investigate the psychosocial
environment of the classroom, and they naturally include a great deal of
information on student perceptions of teachers and teaching behavior. This
review is limited, however, to instruments and studies designed as student
evaluations or ratings of teachers and their behaviors.



Findings 

Findings from the literature search and analysis of selected studies
identified by the descriptor "student-evaluation-of-teacher-performance" can
be summarized as follows:

1. About 80% of the studies concern higher education.

2. The studies are not highly specific according to subject area. A large
number of studies are conducted in psychology classes. Only 4% of the
studies were specifically conducted in science classrooms with the
purpose of improving science education. Other disciplines were also
represented by few studies.

3. Evaluation instruments are generally not included in articles, although
sample items may be. Most instruments utilized are researcher-designed;
some are those routinely used at the university where the
research was conducted.

4. Approximately 30% of the studies investigated teacher characteristics
and teaching styles that resulted in favorable evaluations by students.

5. Approximately 60% of the studies involved development and testing
of evaluation instruments, or looking at the reliability and validity of
student evaluations, including the effects of bias on the results.

6. Approximately 4% of the studies investigated the use of student
evaluations for improvement of teaching, using evaluation feedback as
rationale for change.

7. The remaining categories also received relatively little attention.

The studies targeting science education were conducted more
frequently in secondary classrooms, and several presented the results of
evaluation of teachers known to be exemplary. One new instrument that was
constructed for use in science classrooms, the Science Student Perception
Questionnaire (SSPQ), was used in several studies, having particular
application to the evaluation of student teachers by students. Several
important studies relating to student evaluations of science teachers were not
identified through ERIC using the descriptor, but were found by manual
searches. These articles related to student evaluations of teachers, although
the terminology used was "student perceptions" of teachers and their
behavior. The term "evaluation" may carry some negative connotations
such that authors do not utilize the term in designating ERIC descriptors. To
use "student evaluation of teacher performance" as a descriptor would clearly
help other interested researchers.

Conclusions 

In order to determine the usefulness of student evaluations, one must 
be assured of their reliability (that they are dependable measures of what goes 
on in a classroom, measured by their internal consistency, interrater 
agreement, stability, and generalizability), their validity (that they are an 
accurate measure of teaching effectiveness), and that they are not biased. 
Much of the literature still focuses on these issues, primarily at the college
level. Results generally support the claim that student evaluations are 
reliable and valid indicators of effective teaching when an appropriately 
designed instrument is used. Teachers, however, appear to dismiss the 
usefulness of student evaluations perhaps because of the large number of 
studies that report context-dependent bias, which appears to be of minor 
overall importance. While there is evidence that some variables, teacher
personality for example, do influence student ratings slightly, it is generally 
suggested that these may actually relate to the teaching effectiveness of the 
instructor as perceived by the individual student, and as such are valid (Jones, 
1989). One very interesting study was that of Runco & Thurston (1987), who 
used a method of social validation to construct an instrument based on 
students' ideas of effective teaching and found that it corresponded well to 
the evaluation form used by the university. 

There is more limited research of this type at the elementary and 
secondary levels, and it is unclear whether student maturity would preclude 
the use of these kinds of data. Data using the evaluation form IDEA-H 
would suggest that secondary students are able to give reliable and valid 
evaluations of their teachers (Aubrecht, Hanna, & Hoyt, 1986), and research 
with the SSPQ supports this contention (Jegede, 1989; Tairab & Wilkinson, 
1991). Some authors have found that elementary students are reliable raters 
of teaching behavior (Driscoll et al., 1985; Kronowitz, 1984), and one rating
form was identified, the Primary Grade Pupil Report (Driscoll et al., 1985). Payne
(1984), on the other hand, found elementary students did not give valid
evaluations of student-teachers' performance. While this line of research at
the college level is becoming somewhat redundant, there is much that needs
to be done to ascertain the reliability and validity of student ratings at the 
secondary and elementary levels. Evaluations appear to be fairly accurate 
measures of classroom events and, as such, should be important indicators of 
behaviors that students at all levels perceive as important to their learning. 

The lack of research on the potential of using student evaluations of 
teaching specifically in the science classroom is cause for concern. Science 
classes have special characteristics that may have an influence on the type of 
evaluations utilized and uses of the evaluations. For example, science classes
have laboratories; science courses are perceived by many students as being
especially difficult; science teachers have traditionally required learning a
large body of factual information; and there are substantial differences
between the intended, translated, and achieved curriculum. Student
perceptions of teachers should be of concern to educators, and the various
forms of student evaluations of teachers and teaching could play an
important role in improving science education. Although students have
been asked to give their opinions and perceptions of their classroom and
teachers, few instruments have been constructed that are specifically designed
for use with science students. The construction and validation of the Science
Students Perception Questionnaire should have considerable usefulness in
providing relevant feedback to science teachers as well as providing
information for science education researchers. The challenge to educators is
to discover how to use evaluation information to improve science teaching.
Marsh (1984) appreciated the potential use of student evaluations in research
on teaching, but there is little evidence of research activity in this area.

One needed area of research is the study of the effects of active
intervention (such as consultations) for the purpose of improvement of
teaching skills on the future evaluations of those teachers. An interesting
study by DeNeve (1991) looked at this problem, especially as it relates to the
instructor's own subjective theory of lecturing. He proposed a model that
states that instructors consider changing teaching behavior following an
evaluation only if the change supports the instructor's own subjective theory.
The model supports the use of feedback with consultation. This line of
research could be most exciting, especially as it relates to preservice training
and the development of one's own theory of teaching.

Some studies indicate that student satisfaction, involvement, and
motivation may be enhanced if students are asked to contribute their opinions
for the purpose of improved instruction, and this process could affect student
behavior as well as academic performance. However, there are very few
studies of this type.

Many researchers and practitioners would also be interested in the 
correlation of intervention with achievement of students in those classes. 
Entwistle & Tait (1990) conducted a study to investigate the relationship 
between students' approaches to learning and their evaluations of teaching. 
They found that the way a student defines good teaching depends on the 
depth of the approach to learning of that student. They noted four 
orientations to learning: (1) meaning orientation, a deep approach, with
internal motivation; (2) achieving orientation, a strategic approach with
achievement as a goal; (3) reproducing orientation, a surface approach with a
fear of failure; and (4) non-academic orientation, with low self-confidence and
negative attitudes. It is necessary to add items to the typical evaluation that 
get at those differences. This is another study that seems to be at the forefront 
of research, using student evaluations as a way to develop models of teaching 
and learning and of understanding that can lead to overall improvement of 
instruction. 

The studies that appear to hold considerable promise for improving not
only teaching but also student learning are those in which the
evaluation items are linked directly to a clearly conceptualized and explained
model of student learning and the teaching needed to promote learning.
Data Set for This Review

Year of publication                                       1984  1985  1986  1987  1988  1989  1990  1991

Number of references identified by ERIC descriptor          85    76    77    94    78    61    71    47
Number of those meeting selection criteria                  21    19    25    23    19    22    18    20
Number of references identified by manual search             0     0     2     1     1     3     3     0
Total references available for review                       21    18    23    22    19    23    20    19

Distribution by education level:
  A. Higher                                                 19    13    14    20    13    17    12    16
  B. Secondary                                               0     3     7     1     4     5     6     2
  C. Elementary                                              2     1     1     0     2     0     1     0
  D. Combination                                             0     1     1     1     0     1     1     1

Distribution by type of study:
  A. Instrumentation studies                                15    11    10    14    13    10    11    14
  B. Descriptions of teacher behaviors identified by SET     6     4    10     4     6    11     8     4
  C. Studies of teacher change measured by SET               1     0     2     0     1     1     0     2
  D. Studies of teacher attitudes toward SET                 1     1     2     1     0     1     0     2
  E. Studies of student attitudes toward SET                 1     1     1     2     0     0     1     0
  F. Studies of learning and SET                             1     4     1     2     1     4     3     2

NOTE: The totals do not add up to the number of studies because some studies were 
assigned to more than one category. 
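
The approximate percentages quoted in the Findings can be recomputed from the table above. The following is a minimal Python sketch, assuming the table values are transcribed correctly; the dictionary keys and the `shares` helper are illustrative names, not part of the original study.

```python
# Row values transcribed from the "Data Set for This Review" table (1984-1991).
TOTAL_PER_YEAR = [21, 18, 23, 22, 19, 23, 20, 19]

BY_LEVEL = {
    "higher": [19, 13, 14, 20, 13, 17, 12, 16],
    "secondary": [0, 3, 7, 1, 4, 5, 6, 2],
    "elementary": [2, 1, 1, 0, 2, 0, 1, 0],
    "combination": [0, 1, 1, 1, 0, 1, 1, 1],
}

BY_TYPE = {
    "instrumentation": [15, 11, 10, 14, 13, 10, 11, 14],
    "teacher_behaviors": [6, 4, 10, 4, 6, 11, 8, 4],
    "teacher_change": [1, 0, 2, 0, 1, 1, 0, 2],
    "teacher_attitudes": [1, 1, 2, 1, 0, 1, 0, 2],
    "student_attitudes": [1, 1, 1, 2, 0, 0, 1, 0],
    "learning": [1, 4, 1, 2, 1, 4, 3, 2],
}


def shares(rows, total):
    """Map each category to its percentage of `total`, rounded to one decimal."""
    return {name: round(100 * sum(counts) / total, 1) for name, counts in rows.items()}


if __name__ == "__main__":
    total = sum(TOTAL_PER_YEAR)
    for name, pct in {**shares(BY_LEVEL, total), **shares(BY_TYPE, total)}.items():
        print(f"{name:18s} {pct:5.1f}%")
```

On these figures, instrumentation studies come to roughly 59%, descriptions of teacher behaviors to roughly 32%, and studies of teacher change to roughly 4% of the reviewed references, in line with the rounded values in the Findings. Higher education alone comes to about 75%, or about 79% if the combination studies are counted with it. The type-of-study shares sum to more than 100% because some studies were assigned to more than one category.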